Avoiding Missing Data Biases in Phylogenomic Inference: An Empirical Study in the Landfowl (Aves: Galliformes).

نویسندگان

  • Peter A Hosner
  • Brant C Faircloth
  • Travis C Glenn
  • Edward L Braun
  • Rebecca T Kimball
چکیده

Production of massive DNA sequence data sets is transforming phylogenetic inference, but best practices for analyzing such data sets are not well established. One uncertainty is robustness to missing data, particularly in coalescent frameworks. To understand the effects of increasing matrix size and loci at the cost of increasing missing data, we produced a 90 taxon, 2.2 megabase, 4,800 locus sequence matrix of landfowl using target capture of ultraconserved elements. We then compared phylogenies estimated with concatenated maximum likelihood, quartet-based methods executed on concatenated matrices and gene tree reconciliation methods, across five thresholds of missing data. Results of maximum likelihood and quartet analyses were similar, well resolved, and demonstrated increasing support with increasing matrix size and sparseness. Conversely, gene tree reconciliation produced unexpected relationships when we included all informative loci, with certain taxa placed toward the root compared with other approaches. Inspection of these taxa identified a prevalence of short average contigs, which potentially biased gene tree inference and caused erroneous results in gene tree reconciliation. This suggests that the more problematic missing data in gene tree-based analyses are partial sequences rather than entire missing sequences from locus alignments. Limiting gene tree reconciliation to the most informative loci solved this problem, producing well-supported topologies congruent with concatenation and quartet methods. Collectively, our analyses provide a well-resolved phylogeny of landfowl, including strong support for previously problematic relationships such as those among junglefowl (Gallus), and clarify the position of two enigmatic galliform genera (Lerwa, Melanoperdix) not sampled in previous molecular phylogenetic studies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A molecular genetic time scale demonstrates Cretaceous origins and multiple diversification rate shifts within the order Galliformes (Aves).

The phylogeny of Galliformes (landfowl) has been studied extensively; however, the associated chronologies have been criticized recently due to misplaced or misidentified fossil calibrations. As a consequence, it is unclear whether any crown-group lineages arose in the Cretaceous and survived the Cretaceous-Paleogene (K-Pg; 65.5 Ma) mass extinction. Using Bayesian phylogenetic inference on an a...

متن کامل

A small galliform bird from the Lower Eocene Fur Formation, north-western Denmar

A pair of fossilized imprints of feet represent the first published galliform (landfowl) specimen from the Lower Eocene Fur Formation of northwest Denmark. The specimen is referable to Galliformes due to the presence of a distinctly asymmetric trochlea metatarsi III. The specimen appears distinct from previously described Eocene Galliformes (e.g. Gallinuloididae, Quercymegapodiidae and Paraorty...

متن کامل

Marginal Analysis of A Population-Based Genetic Association Study of Quantitative Traits with Incomplete Longitudinal Data

A common study to investigate gene-environment interaction is designed to be longitudinal and population-based. Data arising from longitudinal association studies often contain missing responses. Naive analysis without taking missingness into account may produce invalid inference, especially when the missing data mechanism depends on the response process. To address this issue in the ana...

متن کامل

An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

In the context of exponential growing molecular databases, it becomes increasingly easy to assemble large multigene data sets for phylogenomic studies. The expected increase of resolution due to the reduction of the sampling (stochastic) error is becoming a reality. However, the impact of systematic biases will also become more apparent or even dominant. We have chosen to study the case of the ...

متن کامل

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Molecular biology and evolution

دوره 33 4  شماره 

صفحات  -

تاریخ انتشار 2016